DETAILED ACTION 
Claim Objections - 35 USC § 112 

1. The following is a quotation of the second paragraph of 35 U.S.C. 112: 

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter which the applicant regards as his invention. 

2. Claim 1 recites the limitation "a heuristic module to generate a thumbnail of the 
audio file. . .", where audio file is not mentioned prior to this limitation. It is unclear as to 
whether "audio information" is synonymous with audio file or if there is another 
correlation between the two limiting terms. There is insufficient antecedent basis for this 
limitation in the claim. 

Claim Rejections - 35 USC § 102 

3. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

4. Claims 1-4, 7, 8, 10, 14, 16-22 are rejected under 35 U.S.C. 102(b) as being 
anticipated by Petkovic et al, US 6185527. 

Re claim 1, "A system for summarizing audio information", Petkovic 
teaches a system and method for summarizing the audio stream, (Abstract). 

"analyzer to convert audio into frames", Petkovic teaches rendering the audio 
streams into intervals, (Abstract). 

"fingerprinting component to convert the frames into fingerprints", Petkovic 
teaches rendering the audio streams into intervals with each interval including one or 
more segments, (abstract), "each fingerprint based on a plurality of frames", Petkovic 
teaches the interval having one or more segments. 

"A similarity detector", Petkovic teaches the interval matching of heuristically 
predefined meta patterns, (abstract). 

"A heuristic module to generate a thumbnail of the audio file", Petkovic teaches 
the indexing of an audio stream based on the classification and pattern matching, with 
only relevant features being indexed, (abstract). 

Re claim 2, "heuristic module comprising at least one of an energy component", 
Petkovic teaches predetermined audio features for a particular range of energy, 
(abstract), "flatness component to help determine a suitable segment of audio for the 
thumbnail", Petkovic teaches a range of spectral energy concentration (abstract) as well 
as a silence test (Fig. 4) which in combination assist in labeling and matching segments 
of audio. 

Re claim 3, "heuristic module is employed to automatically select voiced 
choruses over instrumental portions", Petkovic teaches audio features determined to 
represent speech on music, (abstract). 

Re claim 4, "energy and flatness component are employed when the fingerprints 
do not result in finding a suitable chorus", Petkovic teaches that energy and energy 
concentration tests are employed unconditionally even if a segment match is not found, 
(abstract and Fig. 3). 

Re claim 7, "analyzer computes a set of spectral magnitudes", Petkovic teaches 
Fourier magnitude spectra stored as a log frequency, (Col. 10 line 31-36). 

Re claim 8, "a mean, normalized energy E", Petkovic teaches a mean value over 
all segments where a normalized spectral energy is given by an equation, (Col. 10 line 
37-45). "Dividing a mean energy per frequency component within the frame by the 
average of that quantity over frames in an audio file", Petkovic teaches of calculations 
occurring for i-th frequencies throughout each segment, (Col. 10 line 21-27). Petkovic 
also teaches the difference of the audio feature and mean value of that feature divided 
by the standard deviation of all segments relevant to the i-th frequency, (Col. 10 line 
37-45). 
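
The quoted claim 8 limitation can be illustrated with a short sketch. This is only one possible reading of the quoted language, not the applicant's or Petkovic's implementation: the function name, the use of squared magnitude as "energy", and the zero-energy fallback are all assumptions made for illustration.

```python
def mean_normalized_energy(frames):
    """Hypothetical helper: frames is a list of frames, each a list of
    spectral magnitudes (one value per frequency component)."""
    # Mean energy per frequency component within each frame
    # (energy taken as squared magnitude, an assumption).
    per_frame = [sum(m * m for m in frame) / len(frame) for frame in frames]
    # Average of that per-frame quantity over all frames in the file.
    file_average = sum(per_frame) / len(per_frame)
    if file_average == 0.0:
        # Entirely silent input: return zeros rather than divide by zero.
        return [0.0 for _ in per_frame]
    # Normalized energy E for each frame.
    return [e / file_average for e in per_frame]
```

Under this reading, the normalized values average to one over the file, so a frame with E above one is louder than the file as a whole.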

Re claim 10, "flatness component employs a number added to spectral 
magnitudes", a number being added is broad and will be construed as part of the 
summation of squares of frequencies to calculate spectral energy for each frequency, 
(Col. 10 line 20-27). "Mitigate numerical problems when determining log", Petkovic 
teaches the processing of particular domain speech segments that reduce errors, (Col. 
8 line 37-41). 

Re claim 14, "clustering functions further producing sets of clusters", Petkovic 
teaches intervals that include segments and the grouping of intervals for matching, 
(abstract). 

Claim 16 has been analyzed and rejected with respect to claim 1. Petkovic 
teaches the implementation of claim 1 stored on a computer readable medium, (Col. 3 
line 49-53). 

Claim 17 has been analyzed and rejected with respect to claim 1. Petkovic 
teaches the limitations set forth by claim 17 within claim 1. 

Re claim 18, "plurality of audio fingerprints", Petkovic teaches one or more 
predetermined audio features dependent on intervals dependent on segments, 
(abstract). 

"Clustering the plurality of fingerprints into fingerprint clusters", Petkovic teaches 
intervals that include segments and the grouping of intervals for matching, (abstract). 

"Creating a thumbnail based in part on the fingerprint clusters", Petkovic teaches 
the indexing of an audio stream based on the classification and pattern matching, with 
only relevant features being indexed, (abstract). The audio stream will contain groups 
of intervals containing segments identified by relevant features (intervals selected that 
create a thumbnail or an abbreviated version of a set of particular segments). 

Re claim 19, "clustering further producing one or more cluster sets, each cluster 
set comprising fingerprint clusters", Petkovic teaches rendering intervals where intervals 
include segments and the grouping of intervals for matching, (abstract). A group of 
intervals implies a plurality of intervals, where groups of intervals are construed as 
clusters. 

Re claim 20, "determining whether a cluster set has three or more fingerprint 
clusters", Petkovic teaches interval determination where intervals include segments and 
the grouping of intervals for matching, (abstract). A group of intervals implies a plurality 
of intervals, where groups of intervals are construed as clusters. 

Re claim 21, "based in part on a threshold", Petkovic teaches audio features to 
find out whether an associated segment equals a respective threshold to determine if 
predetermined features are exhibited, (Col. 5 line 1-9). "Help determine if two 
fingerprints belong to the same cluster set", Petkovic teaches the interval matching of 
heuristically predefined meta patterns based on the classification of intervals, (abstract). 

Re claim 22, "clustering operating by considering one fingerprint at a time", 
Petkovic teaches the incrementing of each segment of audio data, (Fig. 7). 



Claim Rejections - 35 USC § 103 

5. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 
USPQ 459 (1966), that are applied for establishing a background for determining 
obviousness under 35 U.S.C. 103(a) are summarized as follows: (See MPEP Ch. 
2141) 

a. Determining the scope and contents of the prior art; 

b. Ascertaining the differences between the prior art and the claims in issue; 

c. Resolving the level of ordinary skill in the pertinent art; and 

d. Evaluating evidence of secondary considerations for indicating 
obviousness or nonobviousness. 

6. Claims 5, 9, 11, 12, 23, 24, and 34 are rejected under 35 U.S.C. 103(a) as 
being unpatentable over Petkovic et al, US 6185527 in view of Wells et al (hereinafter 
Wells), US PGPUB 20030086341 A1. 

Re claim 5, "a component to remove silence at the beginning and end of an 
audio clip via an energy-based threshold", Petkovic teaches silence test (Petkovic Fig. 
4) correlated to frequency dependent spectral energy concentration applied to a 
heuristic threshold, (Petkovic Col. 10 line 37-53). However, Petkovic fails to teach the 
removal of silence at the "beginning and end" of an audio clip. Wells teaches the 
discarding of time information in the beginning and end of the time sample to minimize 
distortion, (Wells [0164]). Therefore, the combined teaching of Petkovic and Wells 
would have rendered obvious a component to remove silence at the start and finish of 
an audio sample through an energy-based threshold. 
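
For orientation, the claim 5 limitation as quoted (removing leading and trailing silence via an energy-based threshold) might be sketched as follows. The function name, the per-frame energy input, and the threshold value in the test data are hypothetical; nothing here is taken from Petkovic or Wells.

```python
def trim_silence(frame_energies, threshold):
    """Hypothetical helper: return (start, end) slice bounds that keep the
    frames from the first to the last frame whose energy reaches threshold."""
    start = 0
    end = len(frame_energies)
    # Skip quiet frames at the beginning of the clip.
    while start < end and frame_energies[start] < threshold:
        start += 1
    # Skip quiet frames at the end of the clip.
    while end > start and frame_energies[end - 1] < threshold:
        end -= 1
    # Quiet frames in the interior are deliberately left in place.
    return start, end
```

Only the two ends are trimmed, which is the distinction the rejection draws between a general silence test and removal at the "beginning and end".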

Re claim 9, "a component that selects a middle portion of an audio file", Petkovic 
teaches of audio information where a group of intervals are selected for testing, 
(Petkovic abstract). A middle portion of an audio file is broad and construed as any 
portion within the audio stream. "Mitigate quiet introduction and fades appearing in the 
audio file", Petkovic teaches of silence occurring within an audio stream but fails to 
teach of avoidance of fades and silent introductions during the segmentation process. 
Wells teaches of stripping away silence as part of the task of increasing the robustness 
of the fingerprint, (Wells [0100]). Wells also teaches the editing of fades within an audio 
stream, (Wells [0057]). Therefore, the combined teaching of Petkovic and Wells would 
have rendered obvious a component to reduce a quiet introduction and fades in an 
audio file. 

Re claim 11, "frame quantity computed as a log normalized geometric mean of 
spectral magnitudes", Petkovic teaches multiple segments within an interval (Petkovic 
abstract) as well as magnitude spectra stored as a log frequency, (Col. 10 line 31-36). 
However Petkovic fails to teach a geometric mean included within the flatness 
component. Wells teaches a geometric mean with equivalency to an arithmetic mean in 
the log domain, (Wells [0175]). Therefore, the combined teaching of Petkovic and Wells 
would have rendered obvious a flatness component including a geometric mean 
function. 

Re claim 12, "subtracting a per-frame log arithmetic mean of a per-frame 
magnitude from the geometric mean", Petkovic teaches normalization, (Col. 10 line 37- 
45). However Petkovic fails to teach the difference of geometric and arithmetic means. 
Wells teaches the difference of an arithmetic mean and a geometric mean, (Wells 
[0217] and following equation). Therefore, the combined teaching of Petkovic and Wells 
would have rendered obvious the difference of geometric and arithmetic means as part 
of a normalization process. 
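
The quantities quoted in claims 10 through 12 (a number added to the spectral magnitudes before taking logs, a log geometric mean, and subtraction of the log arithmetic mean) can be illustrated together. This is a sketch under assumptions: the function name and the value of `eps` are hypothetical, and the claims as quoted in this action do not fix them.

```python
import math

def log_spectral_flatness(magnitudes, eps=1e-10):
    """Hypothetical helper: log geometric mean minus log arithmetic mean
    of one frame's spectral magnitudes."""
    n = len(magnitudes)
    # Add a small number so that log(0) never occurs (eps is an assumption).
    shifted = [m + eps for m in magnitudes]
    # Log of the geometric mean equals the arithmetic mean of the logs.
    log_geometric_mean = sum(math.log(m) for m in shifted) / n
    # Log of the per-frame arithmetic mean.
    log_arithmetic_mean = math.log(sum(shifted) / n)
    return log_geometric_mean - log_arithmetic_mean
```

Because the geometric mean never exceeds the arithmetic mean, the result is at most zero: it is zero for a perfectly flat spectrum and strongly negative for a spectrum dominated by a single component.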

Re claim 23, "determining a parameter (D) describing how closely spread 
clusters are, temporally, throughout an audio file", Petkovic teaches groups of intervals 
of segments from an audio stream, (Petkovic abstract). However Petkovic fails to teach 
of the determination of how evenly spread clusters are. Wells teaches the values within 
fingerprints are a set based on the observed spread of those values across all songs in 
the sample set, (Wells [0231]). By knowing the spread, one can determine how even 
the spread is. Therefore, the combined teaching of Petkovic and Wells would have 
rendered obvious a parameter that determines the spread of the group of intervals 
(clusters) in an audio stream. 

Claim 24 has been analyzed and rejected with respect to claim 23. A temporal 
spread is broad as to be construed as a "spread" within a cluster as the one described 
in claim 23. 

Re claim 34, "automatically fading a beginning or an end of an audio thumbnail", 
Petkovic teaches relevant features within an interval that give a unique identity. 
However Petkovic fails to teach fading at the beginning and end of an audio stream. 
Wells teaches of stripping away silence as part of the task of increasing the robustness 
of the fingerprint, (Wells [0100]). Wells also teaches the editing of fades within an audio 
stream, (Wells [0057]). Therefore, the combined teaching of Petkovic and Wells would 
have rendered obvious fading at the beginning or end of an interval. 

7. Claims 6, 13, 15, and 25-30 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Petkovic et al, US 6185527 in view of Nichogi et al., (hereinafter 
Nichogi), US PGPUB 20030021472 A1. 

Re claim 6, "Average Euclidean distance from each fingerprint to other 
fingerprints for an audio clip is one", Petkovic teaches normalization (Col. 4 line 62-67) 
but fails to teach of normalization through the use of a Euclidean distance. The use of a 
normalization value of one is broad. When read in light of the specification, any 
constant value can be used to evenly space fingerprints. Petkovic teaches speech 
intervals that utilize the use of "one" as a constant, (Col. 4 line 52-55). Nichogi teaches 
the use of a Euclidean distance, (Nichogi [0078]). Therefore, the combined teaching of 
Petkovic and Nichogi would have rendered obvious a fingerprint component that uses 
normalization where a Euclidean distance has a value of one. 
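
One way to read the claim 6 limitation as quoted (fingerprints normalized so that the average Euclidean distance between them is one) is sketched below. Taking the average over all distinct pairs, and the function name itself, are assumptions made for illustration and are not drawn from Petkovic or Nichogi.

```python
import math
from itertools import combinations

def normalize_mean_distance(fingerprints):
    """Hypothetical helper: rescale fingerprint vectors so that their mean
    pairwise Euclidean distance equals one."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        # Fewer than two fingerprints: nothing to normalize against.
        return [list(fp) for fp in fingerprints]
    mean_dist = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    if mean_dist == 0.0:
        # All fingerprints identical: leave them unchanged.
        return [list(fp) for fp in fingerprints]
    # Dividing every component by the mean distance scales all distances.
    return [[x / mean_dist for x in fp] for fp in fingerprints]
```

After this scaling, a distance threshold has the same meaning from one audio clip to the next, which is presumably why a fixed constant such as one is used.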

Re claim 13, "a clustering function producing clusters of similar functions", 
Petkovic teaches groups of intervals and matching of intervals that have similarities, 
(Petkovic abstract). However Petkovic fails to teach a clustering function. Nichogi 
teaches an operation where the determination, fixing, and storing of clusters is 
performed, (Nichogi Fig. 3). Therefore, the combined teaching of Petkovic and Nichogi 
would have rendered obvious a clustering function producing clusters with similarities. 

Re claim 15, "normalized Euclidean distance from F1 to F2 below a first 
threshold", Petkovic teaches normalization (Col. 4 line 62-67) but fails to teach of 
normalization through the use of a Euclidean distance. Petkovic teaches audio features 
to find out whether an associated segment equals a respective threshold to determine if 
predetermined features are exhibited, (Col. 5 line 1-9). However Petkovic fails to teach 
two conditions of being above or below a threshold. Nichogi teaches a Euclidean 
distance between pixels as well as the condition when a distance is greater than a 
threshold value, (Nichogi Fig. 21). "A temporal gap in an audio between where F1 is 
computed and where F2 is computed is above a second threshold", Petkovic teaches a 
silence test where segments will be passed through one at a time (Petkovic Fig. 4). 
There will inevitably be a time gap between each segment while the code executes. 
However Petkovic fails to teach the second condition of the distance being below a 
threshold value. Nichogi teaches a condition when a distance is smaller than a 
threshold value, where quantized vectors are effected, (Nichogi Fig. 21). Therefore, the 
combined teaching of Petkovic and Nichogi would have rendered obvious a Euclidean 
distance component that evaluates the distance between two fingerprints to be above or 
below a certain threshold. 
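
The two conditions quoted from claim 15 (a normalized Euclidean distance from F1 to F2 below a first threshold, and a temporal gap between them above a second threshold) can be expressed as a single test. The function and parameter names below are hypothetical, and the fingerprints are assumed to be already normalized.

```python
import math

def belong_together(f1, t1, f2, t2, max_distance, min_gap):
    """Hypothetical helper: decide whether two fingerprints, computed at
    times t1 and t2, satisfy both quoted conditions."""
    # First condition: the fingerprints are close in feature space.
    close_in_feature_space = math.dist(f1, f2) < max_distance
    # Second condition: they were computed far enough apart in time.
    far_apart_in_time = abs(t2 - t1) > min_gap
    return close_in_feature_space and far_apart_in_time
```

The second condition keeps adjacent, overlapping stretches of audio from counting as a repeat of one another.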

Re claim 25, "normalizing a song to have duration of 1", when read in light of the 
specification, the quantity t-sub-i minus t-sub-(i-1) is a probability. Therefore the 
constant value of 1 used in the subtraction operation in the equation is implied since a 
probability must remain at greatest value "1". Petkovic teaches the use of a confidence 
level probability, (Col. 8 line 54-55). 

"Setting a time position of an i'th cluster be t-sub-i", Petkovic teaches of 
calculations occurring for i-th frequencies throughout each segment, (Col. 10 line 21-27). 

Petkovic teaches of groups of intervals, (Petkovic, abstract). The equation in 
claim 25 reveals a sum of squares where the squared term refers to a probability. This 
summation is the square of a Euclidean distance. Petkovic fails to teach a Euclidean 
distance. However Nichogi teaches a Euclidean distance, (Nichogi [0078] and Equation 
3 [0089]). Subtracting the squared Euclidean from 1 still produces a probability. The 
equation in claim 25 is the equation of Nichogi's ([0089]) adjusted to be a probability. 
Therefore, the combined teaching of Petkovic and Nichogi would have rendered 
obvious a parameter measured through the use of a probability-based squared 
Euclidean equation. 
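
As described in this rejection, the claim 25 equation subtracts a sum of squared gaps, t-sub-i minus t-sub-(i-1), from 1 on a song normalized to duration 1. The sketch below follows that description only; the equation itself is not reproduced in this action, so the function name and the choice to treat the song start (0) and end (1) as boundary points are assumptions.

```python
def temporal_spread(times):
    """Hypothetical helper: times are cluster positions in [0, 1] on a song
    normalized to duration 1. Returns 1 minus the sum of squared gaps."""
    # Assumed boundary convention: the song start and end bracket the clusters.
    points = [0.0] + sorted(times) + [1.0]
    # Gaps t_i - t_(i-1) between consecutive points.
    gaps = [b - a for a, b in zip(points, points[1:])]
    return 1.0 - sum(g * g for g in gaps)
```

Under these assumptions, evenly spaced clusters score higher than clusters bunched at one position: three clusters at 0.25, 0.5, and 0.75 give 0.75, while three clusters all at 0.5 give 0.5.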

Re claim 26, "an offset and scaling factor", Petkovic teaches the use of weighting 
to select only specific portions of the speech, (Petkovic Col. 3 line 18-24). "maximum 
value of 1 and minimum value of 0", The combined teaching of Petkovic and Nichogi 
teaches the use of a confidence level probability, (Col. 8 line 54-55). The combined 
teaching of Petkovic and Nichogi also teaches the use of data being given the range (0,1) 
during processing, (Petkovic Col. 10 line 1-7). 

Re claim 27, "determining a mean spectral quality for fingerprints in a set", The 
combined teaching of Petkovic and Nichogi teaches Fourier magnitude spectra stored 
as a log frequency, (Petkovic Col. 10 line 31-36). 

Re claim 28, "spectral flatness" and a "parameter D, are combined", the 
combined teaching teaches a silence test (Petkovic Fig. 4) correlated to frequency 
dependent spectral energy concentration applied to a heuristic threshold, (Petkovic Col. 
10 line 37-53). "Determine a best cluster set from among a plurality of cluster sets", the 
combined teaching teaches that the most relevant data features of a segment are 
indexed, (Petkovic abstract). 

Claim 29 has been analyzed and rejected with respect to claim 28. Claim 29 
teaches the limitations set forth in claim 28, where the external value of the parameter is 
synonymous with the parameter itself. 

Re claim 30, "best fingerprint within the cluster is determined", the combined 
teaching teaches that the most relevant data features of a segment are indexed, 
(Petkovic abstract). "Surrounding audio, of duration about equal to a duration of an 
audio thumbnail", surrounding audio is broad and is construed as audio within the 
interval with similar features, where the combined teaching teaches of the grouping of 
intervals and pattern matching them, (Petkovic abstract). "Spectral energy and 
flatness", the combined teaching teaches predetermined audio features for a particular 
range of energy, (abstract). The combined teaching also teaches a range of spectral 
energy concentration (abstract) as well as a silence test (Fig. 4) which in combination 
assist in labeling and matching segments of audio. 

8. Claim 31 is rejected under 35 U.S.C. 103(a) as being unpatentable over 
Petkovic et al, US 6185527 in view of Foote, US 6542869. 

Re claim 31, "a longest section of audio within an audio file that repeats in the 
audio file", Petkovic teaches an audio stream and intervals containing segments, all 
extracted from the audio stream, (Petkovic abstract). However Petkovic fails to teach a 
longest section of audio repeating. Foote teaches points of change in music and 
segment boundaries such as a chorus. The longest repeating section of audio, 
particularly music, implies a chorus. Therefore, the combined teaching of Petkovic and 
Foote would have rendered obvious the determination of a chorus or longest repeating 
segment of audio from an audio stream. 

10. Claim 32 is rejected under 35 U.S.C. 103(a) as being unpatentable over 
Petkovic et al, US 6185527 in view of Nichogi et al., (hereinafter Nichogi), US 
PGPUB 20030021472 A1 and further in view of Wells et al (hereinafter Wells), US 
PGPUB 20030086341 A1. 

Re claim 32, "rejecting clusters that are close to the beginning or end of a song", 
Petkovic teaches groups of intervals but fails to teach the removal of silence at the 
"beginning and end" of an audio clip. Wells teaches the discarding of time information 
in the beginning and end of the time sample to minimize distortion, (Wells [0164]). 
Therefore, the combined teaching of Petkovic and Wells would have rendered obvious 
the removal of intervals close to the beginning and end of an audio stream. 

"which energy falls below a threshold for any fingerprint in a predetermined 
window", Petkovic teaches the segmentation of audio data into intervals (Petkovic 
abstract). Petkovic also teaches predetermined audio features for a particular range of 
energy, (abstract). However Petkovic fails to teach two conditions of being above or 
below a threshold. Nichogi teaches a Euclidean distance between pixels as well as the 
condition when a distance is greater than a threshold value, (Nichogi Fig. 21). 
Therefore, the combined teaching of Petkovic and Nichogi would have rendered 
obvious the rejection of clusters when energy levels fall below a threshold. 

"selecting a fingerprint having a highest average spectral flatness", Petkovic 
teaches a range of spectral energy concentration (abstract) as well as a silence test 
(Fig. 4). 

9. Claims 33, 35, and 36 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Petkovic et al, US 6185527 in view of Kanevsky et al, US 
6434520 (hereinafter Kanevsky). 

Re claim 33, "generating a thumbnail by specifying time offsets", Petkovic 
teaches the indexing of an audio stream based on the classification and pattern 
matching, with only relevant features being indexed, (abstract). However Petkovic fails 
to teach time offsets within an audio stream. Kanevsky teaches of windows within a 



Application/Control Number: 10/785,560 Page 15 

Art Unit: 2609 

stream of vectors that are shifted over time, (Kanevsky, Col. 4 line 4-15). Therefore, 
the combined teaching of Petkovic and Kanevsky would have rendered obvious a time 
offset or time shift within the audio stream. 

Re claim 35, "based on a log spectrum computed over a small window", Petkovic 
teaches Fourier magnitude spectra stored as a log frequency, (Col. 10 line 31-36). 
"processing an audio file in at least two layers", Petkovic fails to teach two layers. 
Kanevsky teaches two adjacent sliding windows, (Kanevsky Col. 4-28). "second layer 
operates on a vector computed by aggregating vectors produced by the first layer", 
Petkovic fails to teach this limitation. Kanevsky teaches two adjacent sliding windows 
operating on the stream of vectors where the feature vectors of each window are 
clustered, (Kanevsky Col. 4-28). Therefore, the combined teaching of Petkovic and 
Kanevsky would have rendered obvious two windows or layers that add all vectors 
produced. 

Re claim 36, "providing a wider temporal window in a subsequent layer than a 
preceding layer", the combined teaching teaches alternatives for longer terms 
generated by the speech recognition engine, (Petkovic abstract). A longer term implies 
a longer window in time to extract segments of audio into intervals. However Petkovic 
fails to teach of two layers within an audio file. Kanevsky teaches two adjacent sliding 
windows, (Kanevsky Col. 4-28). Therefore, the combined teaching of Petkovic and 
Kanevsky would have rendered obvious a wider temporal window within the range of 
two layers or windows. 

11. Claim 37 is rejected under 35 U.S.C. 103(a) as being unpatentable over 
Petkovic et al, US 6185527 in view of Kanevsky et al, US 6434520 (hereinafter 
Kanevsky) and further in view of Wells et al (hereinafter Wells), US PGPUB 
20030086341 A1. 

Re claim 37, "at least one of the layers to compensate for time misalignment", the 
combined teaching teaches two layers, (Kanevsky Col. 4-28). However the combined 
teaching fails to teach time misalignment compensation. Wells teaches the distance 
between fingerprints and the correlated matching where the space between fingerprints 
is partitioned into non-overlapping regions, (Wells [0192]). Therefore, the combined 
teaching of Petkovic, Kanevsky, and Wells would have rendered obvious compensation 
for time misalignment. 

Examiner's Note 

The referenced citations made in the rejection(s) above are intended to exemplify 
areas in the prior art document(s) that the examiner believes are the most relevant 
to the claimed subject matter. However, it is incumbent upon the applicant to analyze 
the prior art document(s) in its/their entirety since other areas of the document(s) may 
be relied upon at a later time to substantiate examiner's rationale of record. A prior art 
reference must be considered in its entirety, i.e., as a whole, including portions that 
would lead away from the claimed invention. W.L. Gore & Associates, Inc. v. Garlock, 
Inc., 721 F.2d 1540, 220 USPQ 303 (Fed. Cir. 1983), cert. denied, 469 U.S. 851 (1984). 
However, "the prior art's mere disclosure of more than one alternative does not 
constitute a teaching away from any of these alternatives because such disclosure does 
not criticize, discredit, or otherwise discourage the solution claimed...." In re Fulton, 391 
F.3d 1195, 1201, 73 USPQ2d 1141, 1146 (Fed. Cir. 2004). 